Abstract

The implementation of global optimization algorithms, using the arithmetic 
of infinity, is considered. A relatively simple version of implementation is 
proposed for the algorithms that possess the introduced property of strong 
homogeneity. It is shown that the P-algorithm and the one-step Bayesian 
algorithm are strongly homogeneous. 

1. Introduction

Global optimization problems are considered where the computation of
objective function values using standard computer arithmetic is
problematic because of either underflows or overflows. A promising means for
solving such problems is the arithmetic of infinity [HI [8]. Besides
fundamentally new problems of minimization of functions whose computation
involves infinite or infinitesimal values, the arithmetic of infinity can also
be very helpful in cases where the computation of objective function values
is challenging because the numbers involved differ by many orders
of magnitude. For example, in some problems of statistical inference [15], [16]
the values of operands involved in the computation of objective functions
differ by more than a factor of $10^{200}$.
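
The difficulty can be seen in a minimal sketch, assuming IEEE 754 double precision as used by Python floats. The operand values below are hypothetical and are chosen only to mimic a spread of about 200 orders of magnitude; they are not taken from the cited inference problems.

```python
import math

# Hypothetical operands mimicking a spread of about 200 orders of magnitude.
tiny = 1e-200
huge = 1e200

underflowed = tiny * tiny   # the true value 1e-400 is below the double range
overflowed = huge * huge    # the true value 1e+400 is above the double range

# Rescaling each operand before multiplying keeps the result representable.
rescaled = (tiny * huge) * (tiny * huge)

print(underflowed)                  # 0.0
print(math.isinf(overflowed))       # True
print(math.isclose(rescaled, 1.0))  # True
```

Scaling of this kind is only safe for the optimization if the algorithm itself does not react to it, which is the property studied below.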

The arithmetic of infinity can be applied to the optimization of
challenging objective functions in two ways. First, the optimization algorithm
can be implemented in the arithmetic of infinity. Second, the arithmetic of
infinity can be applied to scale the objective function values to be suitable
for processing by a conventionally implemented optimization algorithm. The
second case is simpler to apply, since the arithmetic of infinity has to be
applied only to the scaling of function values. If both implementation
versions of the algorithm perform identically with respect to the generation of
sequences of points where the objective function values are computed, the
algorithm is called strongly homogeneous. In the present paper, we show
that the P-algorithm and the one-step Bayesian algorithm are strongly
homogeneous.

To be more precise, let us consider two objective functions $f(x)$ and $h(x)$,
$x \in A \subset \mathbb{R}^d$, differing only in the scales of function values, i.e.
$h(x) = af(x) + b$, where $a$ and $b$ are constants that can assume not only
finite but also infinite and infinitesimal values expressed by numerals
introduced in [61 Ej. In its turn, $f(x)$ is defined by using the traditional
finite arithmetic. The sequences of points generated by an algorithm, when
applied to these functions, are denoted by $x_i$, $i = 1, 2, \ldots$, and $v_i$,
$i = 1, 2, \ldots$, respectively. An algorithm that generates identical
sequences, $x_i = v_i$, $i = 1, 2, \ldots$, is called strongly
homogeneous. A weaker property of algorithms is considered in [H |9], where
the algorithms that generate identical sequences for the functions $f(x)$
and $h(x) = f(x) + b$ are called homogeneous. Since the proper scaling of
function values by translation alone is not always possible, in the present
paper we consider invariance of the optimization results with respect to a
more general (affine) transformation of the objective function values.

2. Description of the P-algorithm 

Let us consider the minimization problem

$$\min_{x \in A} f(x), \tag{1}$$

where the multimodality of the objective function $f(x)$ is expected. Although
the properties of the feasible region are not essential in the further analysis,
for the sake of explicitness, $A$ is assumed to be a hyper-rectangle. For the
arguments justifying the construction of global optimization algorithms using
statistical models of objective functions, we refer to 0, H21 [EE]. Global
optimization algorithms based on statistical models implement the ideas of the
theory of rational decision making under uncertainty [10]. The P-algorithm
is constructed in [13] by stating the rationality axioms for the selection
of the point at which the value of $f(x)$ is currently computed; it follows from
the axioms that a point should be selected where the probability of improving
the current best value is maximal.

To implement the P-algorithm, Gaussian stochastic functions are used
mainly because of their computational advantages; however, this type of
statistical model is also justified axiomatically and by the results of a
psychometric experiment [lOj [HJ [13]. Using a non-Gaussian stochastic
function as a statistical model would imply at least serious implementation
difficulties. Let $\xi(x)$ be the Gaussian stochastic function with mean value
$\mu$, variance $\sigma^2$, and correlation function $\rho(\cdot, \cdot)$. The choice of
the correlation function is normally based on the supposed properties of the
targeted objective functions and on the properties of the corresponding
stochastic function; e.g. frequently used correlation functions are
$\rho(x_i, x_j) = \exp(-c\|x_i - x_j\|)$ and $\rho(x_i, x_j) = \exp(-c\|x_i - x_j\|^2)$.
The parameters $\mu$ and $\sigma^2$ should be estimated
using a sample of the objective function values.

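Let $y_i = f(x_i)$ be the function values computed during the previous $n$
minimization steps. By the P-algorithm [TQl EE] the next function value is
computed at the point of maximum probability to overpass the aspiration
level $y_{on}$:

$$x_{n+1} = \arg\max_{x \in A} P\{\xi(x) \le y_{on} \mid \xi(x_i) = y_i,\ i = 1, \ldots, n\}. \tag{2}$$

Since $\xi(x)$ is the Gaussian stochastic function, the maximization in (2) can
be reduced to the maximization of

$$\frac{y_{on} - m_n(x \mid x_i, y_i)}{s_n(x \mid x_i, y_i)}, \tag{3}$$

where $m_n(x \mid x_i, y_i)$ and $s_n^2(x \mid x_i, y_i)$ denote the conditional mean and
conditional variance of $\xi(x)$ with respect to $\xi(x_i) = y_i$, $i = 1, \ldots, n$. The
explicit formulae of $m_n(x \mid x_i, y_i)$ and $s_n^2(x \mid x_i, y_i)$ are presented below
since they will be needed in the further analysis:

$$m_n(x \mid x_i, y_i) = \mu + (y_1 - \mu, \ldots, y_n - \mu)\,\Sigma^{-1}\Upsilon^T, \qquad s_n^2(x \mid x_i, y_i) = \sigma^2\,(1 - \Upsilon\Sigma^{-1}\Upsilon^T), \tag{4}$$

where $\Sigma$ is the $n \times n$ matrix with the elements $\rho(x_i, x_j)$, and
$\Upsilon = (\rho(x, x_1), \ldots, \rho(x, x_n))$.
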
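
As a worked special case, added here only as an illustration under the assumption that $\mu$ and $\sigma^2$ are known and that the usual Gaussian conditioning formulas apply, consider a single observation, $n = 1$. The conditional mean and variance then reduce to

$$m_1(x \mid x_1, y_1) = \mu + (y_1 - \mu)\,\rho(x, x_1), \qquad s_1^2(x \mid x_1, y_1) = \sigma^2\,\bigl(1 - \rho^2(x, x_1)\bigr).$$

At $x = x_1$ the model interpolates the observed value with zero variance, while far from $x_1$, where $\rho(x, x_1) \to 0$, the conditional mean and variance return to $\mu$ and $\sigma^2$.
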
3. Evaluation of the influence of scaling on the search by the
P-algorithm



To evaluate the influence of data scaling on the whole optimization
process, two objective functions are considered: $f(x)$ and $\phi(x) = a \cdot f(x) + b$,
where $a$ and $b$ are constants. Let us assume that the first $n$ function values
were computed for both functions at the same points ($x_i$, $i = 1, \ldots, n$). The
next points of computation of the values of $f(\cdot)$ and $\phi(\cdot)$ are denoted by
$x_{n+1}$ and $v_{n+1}$. We are interested in the strong homogeneity of the
P-algorithm, i.e. in the equality $x_{n+1} = v_{n+1}$.

The parameters of the stochastic function, estimated using the same
method but different function values, are normally different. The estimates
of $\mu$ and $\sigma^2$, obtained using the data ($x_i$, $y_i = f(x_i)$, $i = 1, \ldots, n$) and
($x_i$, $z_i = \phi(x_i)$, $i = 1, \ldots, n$), are denoted as $\bar\mu$, $\bar\sigma^2$ and
$\tilde\mu$, $\tilde\sigma^2$, respectively.
It is assumed that $\tilde\mu = a\bar\mu + b$ and $\tilde\sigma^2 = a^2\bar\sigma^2$; as shown below, this natural
assumption is satisfied for the two most frequently used estimators.

Obviously, the unbiased estimates of $\mu$ and of $\sigma^2$,
$\tilde\mu = \frac{1}{k}\sum_{i=1}^{k} z_i$ and $\tilde\sigma^2 = \frac{1}{k-1}\sum_{i=1}^{k}(\tilde\mu - z_i)^2$,
satisfy the assumptions made. Although those estimates are
well justified only for independent observations, they are sometimes (especially
when only a small number $k$ of observations is available) also used
for rough estimation of the parameters $\mu$ and $\sigma^2$ despite the correlation
between the $\{z_i\}$.
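
The affine behaviour of these two estimates can be checked with a small sketch. It uses exact rational arithmetic so that the comparison is not blurred by rounding; the helper name `estimates_transform_affinely` is hypothetical, and the sample values and constants are borrowed from the numerical example later in the paper purely for illustration.

```python
from fractions import Fraction
from statistics import mean, variance


def estimates_transform_affinely(ys, a, b):
    """Check mu_z = a*mu_y + b and var_z = a^2*var_y for z_i = a*y_i + b."""
    zs = [a * y + b for y in ys]
    mean_ok = mean(zs) == a * mean(ys) + b
    # statistics.variance divides by k - 1, i.e. it is the unbiased estimate.
    var_ok = variance(zs) == a ** 2 * variance(ys)
    return mean_ok, var_ok


# Illustrative data; Fraction parses the decimal strings exactly.
ys = [Fraction(s) for s in ("-0.8", "-0.9", "-0.65", "-0.85", "-0.55")]
a, b = Fraction("3.9765"), Fraction("3.1804")

print(estimates_transform_affinely(ys, a, b))  # (True, True)
```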

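The maximum likelihood estimates also satisfy the assumptions:

$$(\tilde\mu, \tilde\sigma^2) = \arg\max_{\mu, \sigma^2} \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}\,\sigma^n} \exp\left(-\frac{(z - \mu I)\,\Sigma^{-1}(z - \mu I)^T}{2\sigma^2}\right), \tag{5}$$

where $z = (z_1, \ldots, z_n)$, and $I = (1, \ldots, 1)$ is the $n$-dimensional unit vector;
the estimates $\bar\mu$, $\bar\sigma^2$ are obtained in the same way from $y = (y_1, \ldots, y_n)$.

It is easy to show that the maximum likelihood estimates implied by (5)
are equal to

$$\tilde\mu = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} z_i\,(\Sigma^{-1})_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{n} (\Sigma^{-1})_{ij}}, \tag{6}$$

$$\tilde\sigma^2 = \frac{1}{n}\,(z - \tilde\mu I)\,\Sigma^{-1}(z - \tilde\mu I)^T. \tag{7}$$

It follows from (6) and (7) that $\tilde\mu = a\bar\mu + b$ and $\tilde\sigma^2 = a^2\bar\sigma^2$, correspondingly.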
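
The step behind these two relations can be written out explicitly. The following is a sketch in matrix notation, assuming $I = (1, \ldots, 1)$, $z = ay + bI$, and that $\bar\mu$, $\bar\sigma^2$ denote the maximum likelihood estimates computed from $y$ with the same matrix $\Sigma$:

$$\tilde\mu = \frac{I\Sigma^{-1}(ay + bI)^T}{I\Sigma^{-1}I^T} = a\,\frac{I\Sigma^{-1}y^T}{I\Sigma^{-1}I^T} + b = a\bar\mu + b,$$

$$z - \tilde\mu I = a\,(y - \bar\mu I) \quad\Longrightarrow\quad \tilde\sigma^2 = \frac{a^2}{n}\,(y - \bar\mu I)\,\Sigma^{-1}(y - \bar\mu I)^T = a^2\bar\sigma^2.$$

The matrix $\Sigma$ depends only on the points $x_i$, not on the function values, which is why it is unchanged by the scaling.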

The aspiration levels are defined depending on the scales of function
values: $y_{on} = \min_{i=1,\ldots,n} y_i - \varepsilon\bar\sigma$, $z_{on} = \min_{i=1,\ldots,n} z_i - \varepsilon\tilde\sigma$.

Theorem 1. The P-algorithm, based on the Gaussian model with estimated
parameters, is strongly homogeneous. 

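Proof. According to the definition of $v_{n+1}$ the following equalities are valid

$$\begin{aligned}
v_{n+1} &= \arg\max_{x \in A} \frac{z_{on} - m_n(x \mid x_i, z_i)}{s_n(x \mid x_i, z_i)} \\
&= \arg\max_{x \in A} \frac{\min_{i=1,\ldots,n} z_i - \varepsilon\tilde\sigma - \bigl(\tilde\mu + (z_1 - \tilde\mu, \ldots, z_n - \tilde\mu)\,\Sigma^{-1}\Upsilon^T\bigr)}{\tilde\sigma\sqrt{1 - \Upsilon\Sigma^{-1}\Upsilon^T}}.
\end{aligned} \tag{8}$$

Taking into account the relation between $z_i$ and $y_i$ and the corresponding
relations between the estimates of $\mu$ and $\sigma$, equalities (8) can be extended as
follows (here $a > 0$, so that $\tilde\sigma = a\bar\sigma$)

$$\begin{aligned}
v_{n+1} &= \arg\max_{x \in A} \frac{a\min_{i=1,\ldots,n} y_i + b - a\varepsilon\bar\sigma - \bigl(a\bar\mu + b + a\,(y_1 - \bar\mu, \ldots, y_n - \bar\mu)\,\Sigma^{-1}\Upsilon^T\bigr)}{a\bar\sigma\sqrt{1 - \Upsilon\Sigma^{-1}\Upsilon^T}} \\
&= \arg\max_{x \in A} \frac{\min_{i=1,\ldots,n} y_i - \varepsilon\bar\sigma - \bigl(\bar\mu + (y_1 - \bar\mu, \ldots, y_n - \bar\mu)\,\Sigma^{-1}\Upsilon^T\bigr)}{\bar\sigma\sqrt{1 - \Upsilon\Sigma^{-1}\Upsilon^T}} \\
&= \arg\max_{x \in A} \frac{y_{on} - m_n(x \mid x_i, y_i)}{s_n(x \mid x_i, y_i)} = x_{n+1}.
\end{aligned} \tag{9}$$

The equality between $v_{n+1}$ and $x_{n+1}$ means that the sequence of points
generated by the P-algorithm is invariant with respect to the scaling of the
objective function values. The strong homogeneity of the P-algorithm is
proven. □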

As shown in [21 E] , the P-algorithm and the radial basis function algo- 
rithm are equivalent under very general assumptions. Therefore the state- 
ment on the strong homogeneity of the P-algorithm is also valid for the radial 
basis function algorithm. 



4. Evaluation of the influence of scaling on the search by the one- 
step Bayesian algorithm 

Statistical models of objective functions are also used to construct Bayesian
algorithms (H E]. Let a Gaussian stochastic function $\xi(x)$ be chosen for the
statistical model as in Section 2. An implementable version of the Bayesian
algorithm is the so-called one-step Bayesian algorithm defined as follows:

$$x_{n+1} = \arg\max_{x \in A} E\{\max(y_{on} - \xi(x), 0) \mid \xi(x_i) = y_i,\ i = 1, \ldots, n\}. \tag{10}$$

Theorem 2. The one-step Bayesian algorithm, based on the Gaussian model 
with estimated parameters, is strongly homogeneous. 

Proof. The value of the objective function is computed by the one-step
Bayesian algorithm at the point of maximum average improvement (10).
The formula of the conditional mean in (10) can be rewritten as follows

$$E\{\max(y_{on} - \xi(x), 0) \mid \xi(x_i) = y_i,\ i = 1, \ldots, n\} = \int_{-\infty}^{y_{on}} (y_{on} - t)\, p\bigl(t \mid m_n(x \mid x_i, y_i),\, s_n^2(x \mid x_i, y_i)\bigr)\, dt, \tag{11}$$

where $p(t \mid \mu, \sigma^2)$ denotes the Gaussian probability density with the mean
value $\mu$ and variance $\sigma^2$. For simplicity, we use in this formula and
hereinafter the traditional symbol $\infty$. Obviously, when one starts to work in
the framework of the infinite arithmetic [HI E], it should be substituted by
an appropriate infinite number that has been defined a priori by the chosen
statistical model.

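Integration by parts in (11) results in the following formula

$$E\{\max(y_{on} - \xi(x), 0) \mid \xi(x_i) = y_i,\ i = 1, \ldots, n\} = s_n(x \mid x_i, y_i) \int_{-\infty}^{\frac{y_{on} - m_n(x \mid x_i, y_i)}{s_n(x \mid x_i, y_i)}} \Pi(t)\, dt, \tag{12}$$

where $\Pi(t)$ is the Laplace integral: $\Pi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp(-\tau^2/2)\, d\tau$.
From the equalities in (9) and the relation between the estimates of $\sigma$, the
equalities

$$\frac{y_{on} - m_n(x \mid x_i, y_i)}{s_n(x \mid x_i, y_i)} = \frac{z_{on} - m_n(x \mid x_i, z_i)}{s_n(x \mid x_i, z_i)}, \qquad s_n(x \mid x_i, z_i) = a\, s_n(x \mid x_i, y_i),$$

follow. Hence the expression maximized for $\phi(x)$ differs from that for $f(x)$
only by the positive factor $a$, which implies the invariance of the sequence
$x_1, x_2, \ldots$ generated by the one-step Bayesian algorithm with respect to the
scaling of values of the objective function. The strong homogeneity of the
one-step Bayesian algorithm is proven. □
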
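
The two steps of this proof can be checked numerically with a short sketch. It assumes the standard identity $\int_{-\infty}^{u} \Pi(t)\,dt = u\,\Pi(u) + \Pi'(u)$ for the closed form of (12); the function names and all numerical values (conditional mean, standard deviation, aspiration level, and the constants `a`, `b`) are hypothetical and serve only as an illustration.

```python
import math


def normal_cdf(t):
    """Laplace integral Pi(t): the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def improvement_closed_form(y_on, mean, std):
    """Right-hand side of (12), using int Pi(t) dt = u*Pi(u) + pdf(u)."""
    u = (y_on - mean) / std
    return std * (u * normal_cdf(u) + normal_pdf(u))


def improvement_quadrature(y_on, mean, std, steps=20000):
    """Midpoint-rule approximation of (11) with a truncated lower limit."""
    lower = min(mean, y_on) - 10.0 * std
    h = (y_on - lower) / steps
    total = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * h
        density = normal_pdf((t - mean) / std) / std
        total += (y_on - t) * density * h
    return total


# Hypothetical conditional mean, standard deviation and aspiration level
# at some point x, and a hypothetical affine scaling with a > 0.
mean_y, std_y, y_on = 0.3, 0.5, 0.1
a, b = 4.0, 3.0

ei_y = improvement_closed_form(y_on, mean_y, std_y)
ei_z = improvement_closed_form(a * y_on + b, a * mean_y + b, a * std_y)

# (11) and (12) agree up to quadrature error.
print(math.isclose(ei_y, improvement_quadrature(y_on, mean_y, std_y), abs_tol=1e-6))
# Scaling multiplies the average improvement by a, so the maximizer is unchanged.
print(math.isclose(ei_z, a * ei_y, rel_tol=1e-9))
```
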
5. Strong homogeneity is not a universal property of global
optimization algorithms



Although the invariance of the whole optimization process with respect 
to affine scaling of objective function values seems very natural, not all global 
optimization algorithms are strongly homogeneous. For example, the rather 
popular algorithm DIRECT [3] is not strongly homogeneous. We are not 
going to investigate in detail the properties of DIRECT related to the scaling 
of objective function values. Instead an example is presented contradicting 
the necessary conditions of strong homogeneity. 

For the sake of simplicity let us consider the one-dimensional version of
DIRECT. Let the feasible region (interval) be partitioned into subintervals
$[a_i, b_i]$, $i = 1, \ldots, n$. The objective function values computed at the
points $c_i = (a_i + b_i)/2$ are supposed positive, $f(c_i) > 0$; denote
$f_{min} = \min\{f(c_1), \ldots, f(c_n)\}$. The $j$-th subinterval is said to be
potentially optimal if there exists a constant $L > 0$ such that

$$f(c_j) - L\Delta_j \le f(c_i) - L\Delta_i, \quad \forall i = 1, \ldots, n, \tag{13}$$

$$f(c_j) - L\Delta_j \le f_{min} - \varepsilon\,|f_{min}|, \tag{14}$$

where $\Delta_i = (b_i - a_i)/2$, and $\varepsilon$ is a constant defining the requested relative
improvement, $0 < \varepsilon < 1$. All potentially optimal subintervals are subdivided
at the current iteration.

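Let us consider the iteration where the potentially optimal $j$-th
subinterval is not the longest one. Then $f(c_j) \le f(c_i)$ for all $c_i$ where
$\Delta_j = \Delta_i$. Moreover, there exists a constant $L$ such that

$$L \ge \max_{i:\ \Delta_i < \Delta_j} \frac{f(c_j) - f(c_i)}{\Delta_j - \Delta_i}, \tag{15}$$

$$L \le \min_{i:\ \Delta_j < \Delta_i} \frac{f(c_i) - f(c_j)}{\Delta_i - \Delta_j}, \tag{16}$$

$$L \ge \bigl(f(c_j) - f_{min} + \varepsilon\,|f_{min}|\bigr)/\Delta_j. \tag{17}$$

The values $f(c_i)$ and $\Delta_i$ corresponding to the minimum in (16) are
denoted as $f^+$ and $\Delta^+$ correspondingly, i.e.

$$\min_{i:\ \Delta_j < \Delta_i} \frac{f(c_i) - f(c_j)}{\Delta_i - \Delta_j} = \frac{f^+ - f(c_j)}{\Delta^+ - \Delta_j}. \tag{18}$$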
Let the values of the function $\phi(x) = f(x) + \delta$ be computed at the points
$c_i$, and assume that the inequality $\delta > \delta_f/\varepsilon$ is valid, where

$$\delta_f = \bigl(f^+ - f(c_j)\bigr)\,\frac{\Delta_j}{\Delta^+ - \Delta_j} - f(c_j) + (1 - \varepsilon)\,f_{min}. \tag{19}$$

For the data related to $\phi(x)$ the following inequality holds:

$$\begin{aligned}
L_- &= \bigl(\phi(c_j) - \phi_{min} + \varepsilon\,|\phi_{min}|\bigr)/\Delta_j \\
&> \bigl(f(c_j) - f_{min}\bigr)/\Delta_j + \bigl(\varepsilon f_{min} + \delta_f\bigr)/\Delta_j \\
&= \frac{f^+ - f(c_j)}{\Delta^+ - \Delta_j} = L_+,
\end{aligned}$$

and a constant $L$ satisfying the inequalities $L_- \le L \le L_+$ cannot exist.
Therefore the $j$-th subinterval for the function $\phi(x)$ is not potentially optimal
because the necessary conditions (analogous to (16) and (17) for the function
$f(x)$) are not satisfied.
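
The counterexample can be reproduced with a small sketch of the potential optimality test. It assumes that conditions (13), (14) are equivalent to the bounds (15)-(17) together with the comparison among subintervals of equal length; the function name `potentially_optimal` and the three-subinterval data set are hypothetical.

```python
import math


def potentially_optimal(values, half_lengths, eps):
    """Indices j for which some L > 0 satisfies (13) and (14).

    values[i] = f(c_i) > 0 and half_lengths[i] = Delta_i are assumed.
    """
    f_min = min(values)
    selected = []
    for j, (fj, dj) in enumerate(zip(values, half_lengths)):
        lower = (fj - f_min + eps * abs(f_min)) / dj       # bound (17)
        upper = math.inf
        best_among_equal = True
        for i, (fi, di) in enumerate(zip(values, half_lengths)):
            if i == j:
                continue
            if di == dj:
                best_among_equal = best_among_equal and fj <= fi
            elif di < dj:
                lower = max(lower, (fj - fi) / (dj - di))  # bound (15)
            else:
                upper = min(upper, (fi - fj) / (di - dj))  # bound (16)
        if best_among_equal and lower <= upper:
            selected.append(j)
    return selected


# Hypothetical data: three subintervals with half-lengths 1/2, 1/4, 1/8.
half_lengths = [0.5, 0.25, 0.125]
f_values = [2.0, 1.0, 1.5]
eps = 0.1

# For j = 1, (19) gives delta_f = 0.9, so delta = 10 > delta_f / eps = 9.
delta = 10.0
phi_values = [v + delta for v in f_values]

print(potentially_optimal(f_values, half_lengths, eps))    # [0, 1]
print(potentially_optimal(phi_values, half_lengths, eps))  # [0]
```

In this example the middle subinterval is potentially optimal for $f(x)$ but not for $f(x) + \delta$, so the two runs would subdivide different subintervals.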



6. Numerical Example 

To demonstrate the strong homogeneity of the P-algorithm, an example
of one-dimensional optimization is considered. For a statistical model
the stationary Gaussian stochastic function with correlation function
$\rho(t) = \exp(-5t)$ is chosen. Let the values of the first objective function
(say $f(x)$) computed at the points (0, 0.2, 0.5, 0.9, 1) be equal to
(-0.8, -0.9, -0.65, -0.85, -0.55), and the values of the second objective
function (say $\phi(x)$) be equal to (0, -0.4, 0.6, -0.2, 0.99). The graphs of
the conditional mean and conditional standard deviation for both sets of
data are presented in Figure 1. In the section of Figure 1 showing the
conditional means, the horizontal lines are drawn at the levels $y_{on}$ and
$z_{on}$ ($n = 5$) correspondingly.

In spite of the obvious difference in the data, the functions expressing
the probability of improvement for both cases coincide. Therefore, their
maximizers, which define the next points of function evaluations, also coincide.
This coincidence is implied by the strong homogeneity of the P-algorithm and
the following relation: $\phi(x) = af(x) + b$, where the values of $a$, $b$ up to five
decimal digits are equal to $a = 3.9765$, $b = 3.1804$.

Figure 1: An example of data used for planning the current iteration of the P-algorithm

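
The example can be replayed with a minimal sketch, which is not the implementation used to produce Figure 1. Several choices are assumptions made here: the parameters are estimated by the sample mean and unbiased sample variance, $\varepsilon = 0.1$, the arg max over $A = [0, 1]$ is replaced by a search over a uniform grid, and the second data set is generated exactly as $z_i = a y_i + b$ from the rounded constants above. The names `solve`, `rho`, `p_criterion` and `next_point` are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def rho(u, v):
    """Correlation function of the example: exp(-5 t) with t = |u - v|."""
    return math.exp(-5.0 * abs(u - v))


def p_criterion(x, xs, values, eps):
    """Ratio (aspiration level - conditional mean) / conditional std at x."""
    n = len(xs)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    corr = [[rho(u, v) for v in xs] for u in xs]
    upsilon = [rho(x, u) for u in xs]
    w = solve(corr, upsilon)  # Sigma^{-1} Upsilon^T (Sigma is symmetric)
    cond_mean = mu + sum((v - mu) * wi for v, wi in zip(values, w))
    reduction = sum(u * wi for u, wi in zip(upsilon, w))
    cond_std = sigma * math.sqrt(max(1.0 - reduction, 1e-300))
    aspiration = min(values) - eps * sigma
    return (aspiration - cond_mean) / cond_std


def next_point(xs, values, grid, eps=0.1):
    """Grid point maximizing the criterion (a crude stand-in for arg max)."""
    return max(grid, key=lambda x: p_criterion(x, xs, values, eps))


xs = [0.0, 0.2, 0.5, 0.9, 1.0]
ys = [-0.8, -0.9, -0.65, -0.85, -0.55]
a, b = 3.9765, 3.1804
zs = [a * y + b for y in ys]
grid = [(k + 0.5) / 200 for k in range(200)]  # avoids the data points

x_next = next_point(xs, ys, grid)
v_next = next_point(xs, zs, grid)
print(x_next, v_next)  # the two points are expected to coincide
```
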
7. Conclusions



Both the P-algorithm and the one-step Bayesian algorithm are strongly 
homogeneous. The optimization results by these algorithms are invariant 
with respect to affine scaling of values of the objective function. The imple- 
mentations of these algorithms using the conventional computer arithmetic 
combined with the scaling of function values, using the arithmetic of infinity, 
are applicable to the objective functions with either infinite or infinitesimal 
values. The optimization results, obtained in this way, would be identical 
with the results obtained applying the implementations of the algorithms in 
the arithmetic of infinity. 